AITopics | robustness testing


robustness testing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models

Hyun, Sangwon, Ali, Shaukat, Babar, M. Ali

arXiv.org Artificial Intelligence, Jul-9-2025

Assessing the trustworthiness of Large Language Models (LLMs), including their robustness, has garnered significant attention. Recently, metamorphic testing, which defines Metamorphic Relations (MRs), has been widely applied to evaluate the robustness of LLM executions. However, MR-based robustness testing still requires a large number of MRs, which makes optimizing MR selection necessary. Most extant LLM testing studies are limited to automatically generating test cases (i.e., MRs) to enhance failure detection. Additionally, most studies considered only a limited test space of single-perturbation MRs in their evaluation of LLMs. In contrast, our paper proposes a search-based approach for optimizing MR groups to maximize failure detection and minimize LLM execution cost. Moreover, our approach covers combinatorial perturbations in MRs, expanding the test space of the robustness assessment. We developed a search process and implemented four search algorithms, Single-GA, NSGA-II, SPEA2, and MOEA/D, with a novel encoding to solve the MR selection problem in LLM robustness testing. We conducted comparative experiments on the four search algorithms, along with random search, using two major LLMs on primary Text-to-Text tasks. Our statistical and empirical investigation revealed two key findings: (1) the MOEA/D algorithm performed best in optimizing the MR space for LLM robustness testing, and (2) we identified silver-bullet MRs for LLM robustness testing, which demonstrated dominant capabilities in confusing LLMs across different Text-to-Text tasks. For LLM robustness assessment, our research sheds light on the fundamental problem of optimized testing and provides insights into search-based solutions.
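
To make the two objectives concrete, the sketch below scores every group of MRs by how many distinct failing inputs it exposes and how many LLM calls it costs, then keeps the Pareto-optimal groups. It is a minimal illustration under stated assumptions, not the authors' encoding or search: the MR names, failure sets and costs are hypothetical, and exhaustive enumeration stands in for the Single-GA, NSGA-II, SPEA2 or MOEA/D search that a realistically sized MR space would require.

```python
from itertools import combinations

# Hypothetical data: for each MR, the IDs of test inputs on which the LLM's
# output violated the relation, and the number of LLM calls the MR costs.
MR_FAILURES = {
    "typo": {1, 2, 3},
    "synonym": {2, 4},
    "reorder": {5},
    "typo+synonym": {1, 2, 3, 4, 6},  # a combinatorial (two-perturbation) MR
}
MR_COST = {"typo": 10, "synonym": 10, "reorder": 10, "typo+synonym": 20}


def evaluate(group):
    """Return (distinct failures detected, execution cost) for a group of MRs."""
    detected = set().union(*(MR_FAILURES[m] for m in group))
    cost = sum(MR_COST[m] for m in group)
    return len(detected), cost


def dominates(a, b):
    """a dominates b: at least as many failures at no greater cost,
    and strictly better on at least one of the two objectives."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(groups):
    """Keep the groups whose (failures, cost) score no other group dominates."""
    scored = [(g, evaluate(g)) for g in groups]
    return [(g, s) for g, s in scored if not any(dominates(t, s) for _, t in scored)]


# Enumerate every non-empty MR group; a search algorithm replaces this step
# when the space is too large to enumerate.
mrs = sorted(MR_FAILURES)
all_groups = [c for r in range(1, len(mrs) + 1) for c in combinations(mrs, r)]
front = pareto_front(all_groups)

for group, (failures, cost) in front:
    print(group, failures, cost)
```

In this toy data the combinatorial MR `typo+synonym` lands on the front on its own, illustrating why covering combined perturbations can enlarge the useful test space.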

large language model, machine learning, natural language

arXiv:2507.05565

Country:

North America > United States (0.04)
Asia (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Towards a framework on tabular synthetic data generation: a minimalist approach: theory, use cases, and limitations

Shen, Yueyang, Sudjianto, Agus, R, Arun Prakash, Bhattacharyya, Anwesha, Rao, Maorong, Wang, Yaqun, Vaughan, Joel, Zhou, Nengfeng

arXiv.org Machine Learning, Nov-19-2024

We propose and study a minimalist approach to synthetic tabular data generation. The model consists of a minimalistic unsupervised SparsePCA encoder (with a contingent clustering step or log transformation to handle nonlinearity) and an XGBoost decoder, which is SOTA for structured data regression and classification tasks. We study and contrast the methodology with (variational) autoencoders in several toy low-dimensional scenarios to derive the necessary intuitions. The framework is applied to high-dimensional simulated credit scoring data that parallels real-life financial applications. We apply the method to robustness testing to demonstrate practical use cases. The case study results suggest that the method provides an alternative to raw and quantile perturbation for model robustness testing. We show that the method is simple, guarantees interpretability throughout, does not require extra tuning, and provides unique benefits.
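
For comparison, the raw and quantile perturbation baselines mentioned in this abstract can be sketched as follows. The definitions are assumptions (common formulations, not taken from the paper): raw perturbation adds Gaussian noise scaled by the column's standard deviation, while quantile perturbation shifts a value's rank in the empirical distribution and maps it back, so the result always stays on observed values. The function names and the sample column are hypothetical, and the paper's own SparsePCA/XGBoost generator is not reproduced here.

```python
import bisect
import random
import statistics


def raw_perturb(x, column, scale, rng):
    """Raw perturbation (assumed form): add Gaussian noise with standard
    deviation scale * std(column)."""
    return x + rng.gauss(0.0, scale * statistics.pstdev(column))


def quantile_perturb(x, column, delta, rng):
    """Quantile perturbation (assumed form): move x by up to +/- delta of the
    distribution in rank space, then map back to an observed column value."""
    ordered = sorted(column)
    n = len(ordered)
    rank = bisect.bisect_left(ordered, x)           # position of x in the sorted column
    shift = round(rng.uniform(-delta, delta) * n)   # shift of up to delta * n ranks
    new_rank = min(max(rank + shift, 0), n - 1)     # clamp to valid ranks
    return ordered[new_rank]


rng = random.Random(42)
income = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 400.0]  # hypothetical, skewed
print(raw_perturb(30.0, income, 0.1, rng), quantile_perturb(30.0, income, 0.1, rng))
```

On a skewed column the outlier inflates the standard deviation, so raw noise can land far from any observed value, whereas the quantile version only moves between neighbouring observations.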

autoencoder, perturbation, perturbation size

arXiv:2411.10982

Country:

North America > United States > North Carolina (0.04)
North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Banking & Finance > Credit (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Robustness Testing of Multi-Modal Models in Varied Home Environments for Assistive Robots

Hirlimann, Lea, Zhang, Shengqiang, Schütze, Hinrich, Wicke, Philipp

arXiv.org Artificial Intelligence, Jun-19-2024

The development of assistive robotic agents to support household tasks is advancing, yet the underlying models often operate in virtual settings that do not reflect real-world complexity. For assistive care robots to be effective in diverse environments, their models must be robust and integrate multiple modalities. Consider a caretaker needing assistance in a dimly lit room, or help navigating around a newly installed glass door. Models relying solely on visual input might fail in low light, while those using depth information could avoid the door. This demonstrates the need for models that can process various sensory inputs. Our ongoing study evaluates state-of-the-art robotic models in the AI2Thor virtual environment. We introduce disturbances, such as dimmed lighting and mirrored walls, to assess their impact on modalities like movement and vision, and on object recognition. Our goal is to gather input from the Geriatronics community to understand and model the challenges faced by practitioners.
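
The evaluation idea (apply an environmental disturbance, then compare each model's performance with and without it) can be sketched as a small harness. Everything here is a hypothetical stand-in rather than AI2Thor or the models under study: observations are dicts with an `rgb` intensity list and a `depth` list in metres, `dim` models dimmed lighting by scaling only the RGB values, and the two threshold "models" mimic a vision-only agent and one that also uses depth.

```python
# Hypothetical episodes: (observation, obstacle_present)
EPISODES = [
    ({"rgb": [0.1, 0.9], "depth": [0.5, 3.0]}, True),   # high-contrast obstacle
    ({"rgb": [0.5, 0.6], "depth": [3.0, 4.0]}, False),  # open space
    ({"rgb": [0.4, 0.5], "depth": [0.8, 3.0]}, True),   # glass door: low contrast, close in depth
]


def dim(obs, factor=0.25):
    """Dimmed lighting: scale RGB intensities; depth readings are unaffected."""
    return {**obs, "rgb": [v * factor for v in obs["rgb"]]}


def rgb_model(obs):
    """Vision-only detector: flags an obstacle when image contrast is high."""
    return max(obs["rgb"]) - min(obs["rgb"]) > 0.3


def rgb_depth_model(obs):
    """Multi-modal detector: also flags anything closer than 1 m in depth."""
    return rgb_model(obs) or min(obs["depth"]) < 1.0


def accuracy(model, episodes, disturbance=None):
    """Fraction of episodes classified correctly, optionally under a disturbance."""
    apply = disturbance or (lambda obs: obs)
    hits = sum(model(apply(obs)) == label for obs, label in episodes)
    return hits / len(episodes)


for name, model in [("rgb", rgb_model), ("rgb+depth", rgb_depth_model)]:
    print(name, accuracy(model, EPISODES), accuracy(model, EPISODES, dim))
```

With these toy episodes the vision-only model drops from 2/3 to 1/3 accuracy under dimming, while the depth-aware model stays at 1.0, echoing the low-light and glass-door examples in the abstract.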

disturbance, modality, robot

arXiv:2406.12443

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.41)


Robustness Testing for Multi-Agent Reinforcement Learning: State Perturbations on Critical Agents

Zhou, Ziyuan, Liu, Guanjun

arXiv.org Artificial Intelligence, Jun-8-2023

Multi-Agent Reinforcement Learning (MARL) has been widely applied in fields such as smart traffic and unmanned aerial vehicles. However, most MARL algorithms are vulnerable to adversarial perturbations on agent states. Robustness testing of a trained model is an essential step in confirming the model's trustworthiness against unexpected perturbations. This work proposes a novel Robustness Testing framework for MARL that attacks the states of Critical Agents (RTCA). RTCA has two innovations: 1) a Differential Evolution (DE) based method to select critical agents as victims and to advise worst-case joint actions on them; and 2) a team cooperation policy evaluation method employed as the objective function for the DE optimization. Adversarial state perturbations of the critical agents are then generated based on the worst-case joint actions. This is the first robustness testing framework with varying victim agents. RTCA demonstrates outstanding performance in terms of the number of victim agents and in destroying cooperation policies.
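
The first innovation, using Differential Evolution to choose which agents to attack, can be illustrated with a generic DE/rand/1/bin loop. This is a sketch under explicit assumptions, not RTCA itself: each candidate is a real-valued vector with one gene per agent, decoded by taking the `k` agents with the largest genes; the `IMPACT` table and `objective` are hypothetical stand-ins for the team cooperation policy evaluation; and the worst-case joint actions that RTCA also optimizes are omitted.

```python
import random

# Hypothetical per-agent impact: how much attacking each agent degrades team
# reward. In RTCA this would come from evaluating the team's policy instead.
IMPACT = [0.1, 0.9, 0.3, 0.7, 0.2]


def decode(vector, k):
    """Map a continuous DE vector to a victim set: the k agents with the largest genes."""
    return sorted(sorted(range(len(vector)), key=lambda i: vector[i])[-k:])


def objective(victims):
    """Toy stand-in for the team cooperation policy evaluation (higher = more damage)."""
    return sum(IMPACT[i] for i in victims)


def differential_evolution(k, pop_size=12, generations=40, f=0.5, cr=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(IMPACT)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(decode(v, k)) for v in pop]
    history = [max(fitness)]
    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation from three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one gene from the mutant
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            trial_fit = objective(decode(trial, k))
            # Greedy selection: the trial replaces its parent only if at least as good
            if trial_fit >= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
        history.append(max(fitness))
    best = max(range(pop_size), key=lambda i: fitness[i])
    return decode(pop[best], k), fitness[best], history


victims, score, history = differential_evolution(k=2)
print(victims, score)
```

Because selection is greedy, the best score in `history` never decreases; with this toy table the ideal victim set is the two highest-impact agents, indices 1 and 3.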

multi-agent reinforcement learning, robustness testing, state perturbation

doi: 10.3233/FAIA230632

arXiv:2306.06136

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


RegistrationPage

#artificialintelligence, Sep-27-2022, 17:16:24 GMT

Patrick St-Amant is the CTO and cofounder of Zetane Systems, with advanced education in mathematics. He is the inventor of Zetane's technology and leads the development of Zetane Protector (ML model robustness testing and evaluation) and Zetane Insight Engine (a 3D engine for model introspection). He has successfully led several end-to-end ML projects with industrial clients and partners in the fields of Security, Defense, Aerospace, Construction, Aviation, Simulation and Manufacturing. This included project scoping, ML solution design, planning, data engineering, implementation, robustness testing and client interactions. He has spent years as a researcher in number theory, set theory and the foundations of mathematics.

mathematics, registrationpage, robustness testing


Country:

North America > United States (0.40)
North America > Canada > Quebec > Montreal (0.12)
North America > Canada > Ontario > Toronto (0.08)

Industry: Government > Regional Government > North America Government > United States Government (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)


Robustness testing of AI systems: A case study for traffic sign recognition

Berghoff, Christian, Bielik, Pavol, Neu, Matthias, Tsankov, Petar, von Twickel, Arndt

arXiv.org Artificial Intelligence, Aug-13-2021

In recent years, AI systems, in particular neural networks, have seen a tremendous increase in performance, and they are now used in a broad range of applications. Unlike classical symbolic AI systems, neural networks are trained using large data sets, and their inner structure, containing possibly billions of parameters, does not lend itself to human interpretation. As a consequence, it is so far not feasible to provide broad guarantees for the correct behaviour of neural networks during operation if they process input data that differ significantly from those seen during training. However, many applications of AI systems are security- or safety-critical and hence require statements on the robustness of the systems when facing unexpected events, whether these occur naturally or are induced by an attacker in a targeted way. As a step towards developing robust AI systems for such applications, this paper presents how the robustness of AI systems can be practically examined and which methods and metrics can be used to do so. The robustness testing methodology is described and analysed for the example use case of traffic sign recognition in autonomous driving.
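
One common way to quantify robustness is to measure accuracy as a function of perturbation severity. The sketch below does this for a hypothetical two-class "traffic sign" classifier (nearest template over four-pixel vectors) under additive Gaussian noise; the templates, the noise model and the severity levels are illustrative assumptions and do not reproduce the methods or metrics of the paper.

```python
import random

# Hypothetical 4-pixel "sign" templates standing in for real traffic-sign images.
TEMPLATES = {"stop": [1.0, 0.0, 1.0, 0.0], "yield": [0.0, 1.0, 0.0, 1.0]}


def classify(x):
    """Nearest-template classifier using squared Euclidean distance."""
    return min(TEMPLATES, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, TEMPLATES[c])))


def gaussian_noise(x, sigma, rng):
    """Additive Gaussian noise; sigma plays the role of the perturbation severity."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def robustness_curve(samples, severities, trials=50, seed=0):
    """Accuracy at each severity, averaged over repeated noisy copies of each sample."""
    rng = random.Random(seed)
    curve = {}
    for sigma in severities:
        correct = sum(
            classify(gaussian_noise(x, sigma, rng)) == label
            for x, label in samples
            for _ in range(trials)
        )
        curve[sigma] = correct / (len(samples) * trials)
    return curve


samples = [(TEMPLATES["stop"], "stop"), (TEMPLATES["yield"], "yield")]
print(robustness_curve(samples, [0.0, 0.25, 0.5, 1.0, 2.0]))
```

The resulting curve can be summarized, for instance, by its average or by the largest severity at which accuracy stays above a required threshold; which summary fits depends on the application's safety requirements.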

neural network, robustness, robustness property

doi: 10.1007/978-3-030-79150-6_21

arXiv:2108.06159

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
